Application of Different Learning Methods to Hungarian Part-of-Speech Tagging
نویسندگان
چکیده
From the point of view of computational linguistics, Hungar-ian is a diicult language due to its complex grammar and rich morphology. This means that even a common task such as part-of-speech tagging presents a new challenge for learning when looked at for the Hungarian language, especially given the fact that this language has fairly free word order. In this paper we therefore present a case study designed to illustrate the potential and limits of current ILP and non-ILP algorithms on the Hungarian POS-tagging task. We have selected the popular C4.5 and Progol systems as propositional and ILP representatives, adding experiments with our own methods AGLEARN, a C4.5 preprocessor based on attribute grammars, and the ILP approaches PHM and RIBL. The systems were compared on the Hungarian version of the multilingual morphosyntactically annotated MULTEXT-East TELRI corpus which consists of about 100.000 tokens. Experimental results indicate that Hun-garian POS-tagging is indeed a challenging task for learning algorithms, that even simple background knowledge leads to large diierences in accuracy , and that instance-based methods are promising approaches to POS tagging also for Hungarian. The paper also includes experiments with some diierent cascade connections of the taggers.
منابع مشابه
Lessons learned from tagging clinical Hungarian
As more and more textual resources from the medical domain are getting accessible, automatic analysis of clinical notes becomes possible. Since part-of-speech tagging is a fundamental part of any text processing chain, tagging tasks must be performed with high accuracy. While there are numerous studies on tagging medical English, we are not aware of any previous research examining the same fiel...
متن کاملسیستم برچسب گذاری اجزای واژگانی کلام در زبان فارسی
Abstract: Part-Of-Speech (POS) tagging is essential work for many models and methods in other areas in natural language processing such as machine translation, spell checker, text-to-speech, automatic speech recognition, etc. So far, high accurate POS taggers have been created in many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...
متن کاملApplication of AGLEARN for HungarianPart - of - speech
In this paper we present an application of the AGLEARN method to the part-of-speech (POS) tagging of Hungarian sentences. The task of the AGLEARN is to infer the semantic functions associated with the productions. In the learning process the grammar, background semantic functions and examples can be used. We applied the AGLEARN method to infer context rules to choose the correct tags. A corpus ...
متن کاملManually Annotated Hungarian Corpus
Current paper presents the results of a two-year project during which a consortium of the University of Szeged and the MorphoLogic Ltd. Budapest developed a morpho-syntactically parsed and annotated (disambiguated) corpus for Hungarian. For morpho-syntactic encoding, the Hungarian version of MSD (MorphoSyntactic Description) has been used. The corpus contains texts of five different topic areas...
متن کاملUsing a morphological analyzer in high precision POS tagging of Hungarian
The paper presents an evaluation of maxent POS disambiguation systems that incorporate an open source morphological analyzer to constrain the probabilistic models. The experiments show that the best proposed architecture, which is the first application of the maximum entropy framework in a Hungarian NLP task, outperforms comparable state of the art tagging methods and is able to handle out of v...
متن کامل